Trilogy on Computing Maximal Eigenpair
The eigenpair here means the pair consisting of an eigenvalue and its eigenvector.
This paper introduces the three steps of our study on computing the maximal
eigenpair. In the first two steps, we construct efficient initials for a known
but dangerous algorithm, first for tridiagonal matrices and then for
irreducible matrices having nonnegative off-diagonal elements. In the third
step, we present two global algorithms which are still efficient and work well
for a quite large class of matrices, for instance even complex ones.
Comment: Updated version
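For orientation, the baseline computation can be sketched with plain power iteration. This is only the textbook method for a maximal eigenpair, not the paper's algorithm; the efficient initials and the two global algorithms are the paper's contribution and are not reproduced here, and the uniform initial vector below is merely a placeholder assumption.

```python
import numpy as np

def max_eigenpair(A, v0=None, tol=1e-10, max_iter=10_000):
    """Approximate the maximal eigenpair of a symmetric matrix A by
    power iteration with a Rayleigh-quotient eigenvalue estimate.
    The choice of initial vector v0 is exactly what a good method
    optimizes; here we fall back to a uniform start."""
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n) if v0 is None else v0 / np.linalg.norm(v0)
    lam = v @ A @ v
    for _ in range(max_iter):
        w = A @ v
        v_new = w / np.linalg.norm(w)
        lam_new = v_new @ A @ v_new
        if abs(lam_new - lam) < tol:
            return lam_new, v_new
        lam, v = lam_new, v_new
    return lam, v

# Toy tridiagonal matrix with nonnegative off-diagonal elements.
A = np.diag([2.0, 3.0, 4.0]) + np.diag([1.0, 1.0], 1) + np.diag([1.0, 1.0], -1)
lam, v = max_eigenpair(A)
```

The iteration converges at a rate governed by the gap between the two largest eigenvalues, which is why a well-chosen initial vector matters so much in practice.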
Random Surfing Without Teleportation
In the standard Random Surfer Model, the teleportation matrix is necessary to
ensure that the final PageRank vector is well-defined. The introduction of this
matrix, however, results in serious problems and imposes fundamental
limitations to the quality of the ranking vectors. In this work, building on
the recently proposed NCDawareRank framework, we exploit the decomposition of
the underlying space into blocks, and we derive easy-to-check necessary and
sufficient conditions for random surfing without teleportation.
Comment: 13 pages. Published in the volume "Algorithms, Probability, Networks and Games", Springer-Verlag, 2015. (The updated version corrects small typos/errors.)
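For context, a minimal sketch of the standard Random Surfer Model that the paper departs from. The damping factor `alpha` and the uniform teleportation term are the textbook choices, not anything specific to the NCDawareRank-based conditions derived in the paper.

```python
import numpy as np

def pagerank(P, alpha=0.85, tol=1e-12):
    """Power iteration for the standard Random Surfer Model.

    With probability alpha the surfer follows the row-stochastic matrix P;
    otherwise she teleports to a uniformly random node.  The teleportation
    term is what makes the stationary vector well-defined even when P alone
    is reducible or periodic."""
    n = P.shape[0]
    pi = np.ones(n) / n
    while True:
        pi_new = alpha * pi @ P + (1 - alpha) / n
        if np.abs(pi_new - pi).sum() < tol:
            return pi_new
        pi = pi_new

# Toy 3-node chain: without teleportation this walk is periodic,
# so its PageRank vector would not be well-defined as a limit.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
pi = pagerank(P)
```

The paper's point is precisely that the conditions under which this teleportation term can be dropped are checkable on the block structure of the underlying space.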
Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data-mining, latent semantic analysis, contextual search of
databases, etc. have long ago been developed by computer scientists working on
information retrieval (IR). Experimental scientists, from all disciplines,
having to analyse large collections of raw experimental data (astronomical,
physical, biological, etc.) have developed powerful methods for their
statistical analysis and for clustering, categorising, and classifying objects.
Finally, physicists have developed a theory of quantum measurement, unifying
the logical, algebraic, and probabilistic aspects of queries into a single
formalism. The purpose of this paper is twofold: first to show that when
formulated at an abstract level, problems from IR, from statistical data
analysis, and from physical measurement theories are very similar and hence can
profitably be cross-fertilised; and second, to propose a novel method of
fuzzy hierarchical clustering, termed \textit{semantic distillation} --
strongly inspired by the theory of quantum measurement -- which we developed to
analyse raw data coming from various types of experiments on DNA arrays. We
illustrate the method by analysing DNA array experiments and clustering the
genes of the array according to their specificity.
Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag
The intellectual influence of economic journals: quality versus quantity
The evaluation of scientific output has a key role in the allocation of
research funds and academic positions. Decisions are often based on quality indicators
for academic journals, and over the years, a handful of scoring methods have
been proposed for this purpose. Discussing the most prominent methods (de facto
standards), we show that they do not distinguish quality from quantity at the article level.
The systematic bias we find is analytically tractable and implies that the methods are
manipulable. We introduce modified methods that correct for this bias, and use them
to provide rankings of economic journals. Our methodology is transparent and our results
are replicable.
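As background, the prominent scoring methods discussed are eigenvector-type schemes on the journal citation network. The sketch below is a generic such score (in the spirit of invariant/Eigenfactor-style methods); the toy citation matrix is invented for illustration, and the paper's bias correction is deliberately not reproduced.

```python
import numpy as np

def eigen_scores(C, tol=1e-12):
    """Eigenvector-type journal scores.

    C[i, j] = citations from journal j to journal i.  Each journal's
    outgoing citations are normalized so M is column-stochastic, and the
    score vector is the dominant eigenvector w = M w, found by power
    iteration.  Note this rewards total citations received, which is why
    such scores can conflate article quality with article quantity."""
    M = C / C.sum(axis=0, keepdims=True)
    n = C.shape[0]
    w = np.ones(n) / n
    while True:
        w_new = M @ w
        w_new /= w_new.sum()
        if np.abs(w_new - w).sum() < tol:
            return w_new
        w = w_new

# Hypothetical citation matrix for three journals (no self-citations).
C = np.array([[0.0, 5.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 4.0, 0.0]])
w = eigen_scores(C)
```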
Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
<p>Abstract</p> <p>Background</p> <p>Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of <it>k</it>-partite graphs. These graphs contain <it>k </it>different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type.</p> <p>Results</p> <p>Since entities in biological networks regularly have more than one function and hence participate in more than one cluster, we developed a <it>k</it>-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted <it>k</it>-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to structure this graph on a large scale into clusters that are functionally correlated and biologically meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified this for the transcription factor MECP2.</p> <p>Conclusions</p> <p>In order to cope with the overwhelming amount of information available from biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. 
To this end, we presented a novel fuzzy <it>k</it>-partite graph partitioning algorithm that allows the decomposition of these objects in a comprehensive fashion. We validated our approach on both artificial and real-world data. It is readily applicable to further problems of this kind.</p>
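The multiplicative updates the abstract refers to can be illustrated in the bipartite special case. This is a hedged sketch of standard Lee-Seung updates for non-negative matrix factorization, which the paper's k-partite algorithm mimics; the actual k-partite algorithm and its estimated cluster-level graph are not reproduced here.

```python
import numpy as np

def fuzzy_memberships(A, k, n_iter=500, seed=0):
    """Lee-Seung multiplicative updates for A ~ W H (Frobenius loss).

    Row-normalizing W gives each row node a degree of membership in each
    of the k clusters; overlapping (fuzzy) assignments arise naturally
    because memberships need not be 0/1."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W / W.sum(axis=1, keepdims=True)

# Toy bipartite adjacency with two obvious blocks of row nodes.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
M = fuzzy_memberships(A, 2)
```

Because the updates are multiplicative and all factors stay non-negative, entries never flip sign, which keeps the membership interpretation valid throughout the iteration.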
Hidden dynamics of soccer leagues: the predictive 'power' of partial standings
Objectives Soccer league tables reflect the partial standings of the teams involved after each round of competition. However, the ability of partial league standings to predict end-of-season position has largely been ignored. Here we analyze historical partial standings from English soccer to understand the mathematics underpinning league performance and evaluate the predictive 'power' of partial standings. Methods Match data (1995-2017) from the four senior English leagues was analyzed, together with random match scores generated for hypothetical leagues of equivalent size. For each season the partial standings were computed and Kendall's normalized tau-distance and Spearman r-values determined. Best-fit power-law and logarithmic functions were applied to the respective tau-distance and Spearman curves, with the goodness-of-fit assessed using the R2 value. The predictive ability of the partial standings was evaluated by computing the transition probabilities between the standings at rounds 10, 20 and 30 and the final end-of-season standings for the 22 seasons. The impact of reordering match fixtures was also evaluated. Results All four English leagues behaved similarly, irrespective of the teams involved, with the tau-distance conforming closely to a power law (R2>0.80) and the Spearman r-value obeying a logarithmic function (R2>0.87). The randomized leagues also conformed to a power law, but had a different shape. In the English leagues, team position relative to end-of-season standing became 'fixed' much earlier in the season than was the case with the randomized leagues. In the Premier League, 76.9% of the variance in the final standings was explained by round 10, 87.0% by round 20, and 93.9% by round 30. Reordering of match fixtures appeared to alter the shape of the tau-distance curves. Conclusions All soccer leagues appear to conform to mathematical laws, which constrain the league standings as the season progresses.
This means that partial standings can be used to predict end-of-season league position with reasonable accuracy.
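The distance measure used in the Methods section can be sketched directly, assuming the usual definition of the normalized Kendall tau-distance as the fraction of discordant pairs; the exact conventions of the paper (e.g. tie handling) are not specified in the abstract.

```python
from itertools import combinations

def normalized_tau_distance(table1, table2):
    """Normalized Kendall tau-distance between two league tables given as
    ordered lists of the same teams: the fraction of team pairs whose
    relative order differs between the two tables (0 = identical order,
    1 = completely reversed)."""
    pos1 = {team: i for i, team in enumerate(table1)}
    pos2 = {team: i for i, team in enumerate(table2)}
    pairs = list(combinations(table1, 2))
    discordant = sum(
        (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0 for a, b in pairs
    )
    return discordant / len(pairs)
```

Comparing the round-r table against the final table with this measure, season by season, yields the tau-distance curves to which the power-law fits were applied.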